
TABLE 3.3
Performance contributions of the components in RBCNs on CIFAR100, where Bi = Bi-Real Net, R = RBConv, G = GAN, and B = update strategy.

Model   Kernel Stage    Bi      R       R+G     R+G+B
RBCN    32-32-64-128    54.92   56.54   59.13   61.64
RBCN    32-64-128-256   63.11   63.49   64.93   65.38
RBCN    64-64-128-256   63.81   64.13   65.02   66.27

Note: The numbers in bold represent the best results.

3) We further improve RBCNs by updating the BN layers with W and C fixed after each epoch (line 17 in Algorithm 13). This further increases the accuracy by 2.51% (61.64% vs. 59.13%) on CIFAR100 with the kernel stage 32-32-64-128.
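The per-epoch BN update step can be sketched as follows. This is a minimal PyTorch-style illustration, not the book's implementation: the function name `update_bn_only` and the training-loop details are assumptions; the only point it demonstrates is fine-tuning the BN affine parameters while all other weights (the convolutions W and filters C) stay fixed.

```python
import torch
import torch.nn as nn

def update_bn_only(model, loader, criterion, lr=0.01, device="cpu"):
    """Fine-tune only the BatchNorm parameters, keeping all other
    weights fixed, mirroring the per-epoch BN update step above."""
    bn_params = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            bn_params += list(m.parameters())
    # Freeze everything, then re-enable gradients for BN only.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in bn_params:
        p.requires_grad_(True)
    opt = torch.optim.SGD(bn_params, lr=lr)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    # Restore gradients so normal training resumes next epoch.
    for p in model.parameters():
        p.requires_grad_(True)
```

Because BN's scale and shift act per channel, re-fitting them against the current (fixed) binarized filters is cheap and can recover accuracy lost to quantization, which is consistent with the gain reported above.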

3.7 BONN: Bayesian Optimized Binary Neural Network

First, we briefly introduce Bayesian learning. Bayesian learning is a paradigm for constructing statistical models based on Bayes' theorem; it provides practical learning algorithms and helps us understand other learning algorithms. Bayesian learning shows its signifi-


FIGURE 3.19

The evolution of the prior p(x), the distribution of the observation y, and the posterior p(x|y) during learning, where x is the latent variable representing the full-precision parameters and y is the quantization error. Initially, the parameters x are initialized according to a single-mode Gaussian distribution. When our learning algorithm converges, the ideal case is that (i) p(y) becomes a Gaussian distribution N(0, ν), which corresponds to the minimum reconstruction error, and (ii) p(x|y) = p(x) is a Gaussian mixture distribution with two modes located at the binarized values x̂ and −x̂.
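The ideal converged state in the caption can be made concrete with a small NumPy sketch, under illustrative assumptions: full-precision weights x are drawn from a two-mode Gaussian mixture centered at ±x̂, and the quantization error y = x − x̂·sign(x) is then measured. The specific values of x̂ and ν below are arbitrary, chosen only to show that y is approximately zero-mean Gaussian with variance ν.

```python
import numpy as np

rng = np.random.default_rng(0)

x_hat = 0.5   # magnitude of the binarized values +x_hat / -x_hat (illustrative)
nu = 0.01     # variance of each mixture mode (illustrative)

# Ideal converged prior p(x): a Gaussian mixture with modes at +x_hat and -x_hat.
modes = rng.choice([-x_hat, x_hat], size=100_000)
x = rng.normal(loc=modes, scale=np.sqrt(nu))

# Quantization error y = x - x_hat * sign(x): the distance from each
# full-precision weight to its nearest binarized value.
y = x - x_hat * np.sign(x)

print("mean of y:", y.mean(), " variance of y:", y.var())
```

When the modes are well separated relative to sqrt(ν), almost no sample falls on the wrong side of zero, so y concentrates as N(0, ν), matching the minimum-reconstruction-error condition (i) in the caption.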